Minimal Residual Approaches for Policy Evaluation in Large Sparse Markov Chains
نویسندگان
چکیده
We consider the problem of policy evaluation in a special class of Markov Decision Processes (MDPs) where the underlying Markov chains are large and sparse. We start from a stationary model equation that the limit of Temporal Difference (TD) learning satisfies, and develop a Robbins-Monro method consistently estimating its coefficients. Then we introduce the minimal residual approaches, which solve an approximate version of the stationary model equation. Incremental Least-squares temporal difference (iLSTD) is shown to be a special form of minimal residual approaches. We also develop a new algorithm called minimal residual (MR) algorithm whose step-size can be computed on line. We introduce the Compressed Sparse Row (CSR) format and reduce the complexity of MR to near that of TD. The advantages of the MR algorithm are that it has comparable data efficiency and computational efficiency to iLSTD, but does not require manual selection of step-size.
منابع مشابه
Empirical Bayes Estimation in Nonstationary Markov chains
Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...
متن کاملEvaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes
Growing amount of information on biological sequences has made application of statistical approaches necessary for modeling and estimation of their functions. In this paper, sensitivity and specificity of the first and second Markov chains for prediction of genes was evaluated using the complete double stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...
متن کاملFast multilevel methods for Markov chains
This paper describes multilevel methods for the calculation of the stationary probability vector of large, sparse, irreducible Markov chains. In particular, several recently proposed significant improvements to the multilevel aggregation method of Horton and Leutenegger are described and compared. Furthermore, we propose a very simple improvement of that method using an over-correction mechanis...
متن کاملDistributed solving of Markov chains for computer network models
In this paper a distributed iterative GMRES algorithm for solving huge and sparse linear systems (that appear in the Markov chain analysis of queueing network models) is considered. It is implemented using the MPI standard on a collection of Linux machines and the emphasis is put upon the size of linear systems being solved and possibility of storing huge and sparse matrices as well as huge vec...
متن کاملPreconditioned Generalized Minimal Residual Method for Solving Fractional Advection-Diffusion Equation
Introduction Fractional differential equations (FDEs) have attracted much attention and have been widely used in the fields of finance, physics, image processing, and biology, etc. It is not always possible to find an analytical solution for such equations. The approximate solution or numerical scheme may be a good approach, particularly, the schemes in numerical linear algebra for solving ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008